CO-PILOT: COllaborative Planning and reInforcement Learning On sub-Task curriculum
Goal-conditioned reinforcement learning (RL) usually suffers from sparse rewards and inefficient exploration in long-horizon tasks. Planning can find the shortest path to a distant goal and thereby provide dense reward or guidance, but it is inaccurate without a precise environment model. We show that RL and planning can learn collaboratively from each other to overcome their respective drawbacks. In CO-PILOT, a learnable path-planner and an RL agent produce dense feedback to train each other on a curriculum of tree-structured sub-tasks. First, the planner recursively decomposes a long-horizon task into a tree of sub-tasks in a top-down manner; the layers of this tree form coarse-to-fine sub-task sequences that serve as plans for completing the original task.
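The top-down decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation. The learnable planner is replaced here by a hypothetical `midpoint_planner` that simply interpolates between start and goal in a 2D state space, and the names `SubTask`, `decompose`, and `layer_plans` are assumptions introduced only for illustration. The sketch shows how each layer of the resulting binary tree yields a progressively finer sequence of sub-tasks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, float]  # assumed 2D state, for illustration only


@dataclass
class SubTask:
    """A node of the sub-task tree: reach `goal` starting from `start`."""
    start: State
    goal: State
    left: Optional["SubTask"] = None
    right: Optional["SubTask"] = None


def midpoint_planner(start: State, goal: State) -> State:
    # Hypothetical stand-in for the learnable planner: it proposes the
    # geometric midpoint as the intermediate sub-goal.
    return ((start[0] + goal[0]) / 2, (start[1] + goal[1]) / 2)


def decompose(start: State, goal: State, depth: int) -> SubTask:
    """Recursively split (start -> goal) into two sub-tasks, top-down."""
    node = SubTask(start, goal)
    if depth > 0:
        mid = midpoint_planner(start, goal)
        node.left = decompose(start, mid, depth - 1)
        node.right = decompose(mid, goal, depth - 1)
    return node


def layer_plans(root: SubTask) -> List[List[Tuple[State, State]]]:
    """Return one plan per tree layer, ordered from coarse to fine."""
    plans = []
    layer = [root]
    while layer:
        plans.append([(n.start, n.goal) for n in layer])
        layer = [c for n in layer for c in (n.left, n.right) if c is not None]
    return plans
```

Calling `layer_plans(decompose((0.0, 0.0), (8.0, 0.0), 2))` returns three plans containing 1, 2, and 4 sub-tasks respectively; each plan covers the same overall route, so an agent could be trained on the finer layers before attempting the coarser ones.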